The UTF8String may come from an UnsafeRow, in which case its underlying buffer is not copied, so we should clone it in order to hold it in Stats. cc yhuai Author: Davies Liu <davies@databricks.com> Closes apache#8929 from davies/pushdown_string. (cherry picked from commit ea02e55) Signed-off-by: Yin Huai <yhuai@databricks.com>
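To illustrate why the clone matters, here is a minimal sketch in plain Java. `BytesView` is a hypothetical stand-in for a string value that points into a buffer it does not own; it is not Spark's `UTF8String`, and the buffer reuse is simulated by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for a string value that points into a buffer it does not own. */
final class BytesView {
    final byte[] buffer;
    final int offset;
    final int length;

    BytesView(byte[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    /** Copies the referenced bytes into a private array, so later writes to the source buffer are not visible. */
    BytesView copy() {
        return new BytesView(Arrays.copyOfRange(buffer, offset, offset + length), 0, length);
    }

    @Override
    public String toString() {
        return new String(buffer, offset, length, StandardCharsets.UTF_8);
    }
}

final class CloneSketch {
    /** Returns {value seen through the retained view, value seen through the retained copy}. */
    static String[] run() {
        // Simulated reusable row buffer (an assumption made for illustration).
        byte[] rowBuffer = "apple".getBytes(StandardCharsets.UTF_8);
        BytesView view = new BytesView(rowBuffer, 0, rowBuffer.length);
        BytesView retainedView = view;        // still points at rowBuffer
        BytesView retainedCopy = view.copy(); // owns its bytes

        // Simulate the next row overwriting the same buffer.
        byte[] next = "zebra".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(next, 0, rowBuffer, 0, next.length);

        return new String[] { retainedView.toString(), retainedCopy.toString() };
    }
}
```

The retained view silently changes when the buffer is reused, while the copy keeps the value that was current when it was taken, which is the property a long-lived statistics holder needs.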
In the course of https://issues.apache.org/jira/browse/LEGAL-226 it came to light that the guidance at http://www.apache.org/dev/licensing-howto.html#permissive-deps on permissively-licensed dependencies has a different interpretation than we (er, I) had been operating under. "pointer ... to the license within the source tree" specifically means a copy of the license within Spark's distribution, whereas at the moment, Spark's LICENSE has a pointer to the project's license in the other project's source tree. The remedy is simply to inline all such license references (i.e. BSD/MIT licenses) or include their text in a "licenses" subdirectory and point to that. Along the way, we can also treat other BSD/MIT licenses, whose text has been inlined into LICENSE, in the same way. The LICENSE file can continue to provide a helpful list of BSD/MIT licensed projects and a pointer to their sites. This would be over and above including license text in the distro, which is the essential thing. Author: Sean Owen <sowen@cloudera.com> Closes apache#8919 from srowen/SPARK-10833. (cherry picked from commit bf4199e) Signed-off-by: Sean Owen <sowen@cloudera.com>
…AllocationSuite

Fix the following issues in StandaloneDynamicAllocationSuite:

1. It should not assume master and workers start in order
2. It should not assume master and workers get ready at once
3. It should not assume the application is already registered with master after creating SparkContext
4. It should not access Master.app and idToApp which are not thread safe

The changes include:

* Use `eventually` to wait until master and workers are ready to fix 1 and 2
* Use `eventually` to wait until the application is registered with master to fix 3
* Use `askWithRetry[MasterStateResponse](RequestMasterState)` to get the application info to fix 4

Author: zsxwing <zsxwing@gmail.com>
Closes apache#8914 from zsxwing/fix-StandaloneDynamicAllocationSuite.
(cherry picked from commit dba95ea)
Signed-off-by: Andrew Or <andrew@databricks.com>
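The idea behind waiting with `eventually` instead of assuming readiness can be sketched as a small polling helper. This is a simplified Java illustration, not ScalaTest's `eventually` (which retries a block until it stops throwing); the class name, method name, and parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

final class Eventually {
    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true if the condition held in time, false otherwise.
     */
    static boolean eventually(long timeoutMillis, long intervalMillis, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

A test would wrap each readiness check (workers registered, application registered) in such a call rather than asserting immediately after startup.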
Author: Ryan Williams <ryan.blake.williams@gmail.com> Closes apache#8939 from ryan-williams/errmsg. (cherry picked from commit b7ad54e) Signed-off-by: Andrew Or <andrew@databricks.com>
…Suite Fixed the test failure here: https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ The failure occurs because `HeartbeatReceiverSuite.heartbeatReceiver` may receive `SparkListenerExecutorAdded("driver")` sent from [LocalBackend](https://github.com/apache/spark/blob/8fb3a65cbb714120d612e58ef9d12b0521a83260/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala#L121). There are other race conditions in `HeartbeatReceiverSuite` because `HeartbeatReceiver.onExecutorAdded` and `HeartbeatReceiver.onExecutorRemoved` are asynchronous. This PR also fixes them. Author: zsxwing <zsxwing@gmail.com> Closes apache#8946 from zsxwing/SPARK-10058. (cherry picked from commit 9b3e776) Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The fix is to coerce `c("a", "b")` into a list so that it can be serialized when calling into the JVM.
Author: felixcheung <felixcheung_m@hotmail.com>
Closes apache#8961 from felixcheung/rselect.
(cherry picked from commit 721e8b5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
I don't believe the API changed at all. Author: Avrohom Katz <iambpentameter@gmail.com> Closes apache#8957 from akatz/kcl-upgrade. (cherry picked from commit 883bd8f) Signed-off-by: Sean Owen <sowen@cloudera.com>
`Murmur3_x86_32.hashUnsafeWords` only accepts word-aligned bytes, but the bytes of an unsafe array are not word-aligned. Author: Wenchen Fan <cloud0fan@163.com> Closes apache#8987 from cloud-fan/hash.
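The difference between a word-only hash and one that tolerates arbitrary lengths can be sketched as follows. This is an illustrative Murmur3-style hash in plain Java, not Spark's `Murmur3_x86_32`; the method names are hypothetical, "word-aligned" is assumed here to mean a length that is a multiple of 8 bytes, and the handling of leftover bytes is one possible choice rather than a standard.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Murmur3Sketch {
    private static final int C1 = 0xcc9e2d51;
    private static final int C2 = 0x1b873593;

    private static int mixK1(int k1) {
        k1 *= C1;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= C2;
        return k1;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Mixes the first alignedLength bytes (a multiple of 4) as little-endian ints. */
    private static int hashInts(byte[] data, int alignedLength, int seed) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        int h1 = seed;
        for (int i = 0; i < alignedLength; i += 4) {
            h1 = mixH1(h1, mixK1(buf.getInt(i)));
        }
        return h1;
    }

    /** Word-only variant: rejects input whose length is not a multiple of 8 bytes. */
    static int hashWords(byte[] data, int seed) {
        if (data.length % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8 bytes");
        }
        return fmix(hashInts(data, data.length, seed), data.length);
    }

    /** Byte-tolerant variant: hashes whole 4-byte groups, then mixes in each leftover byte. */
    static int hashBytes(byte[] data, int seed) {
        int aligned = data.length - data.length % 4;
        int h1 = hashInts(data, aligned, seed);
        for (int i = aligned; i < data.length; i++) {
            h1 = mixH1(h1, mixK1(data[i]));
        }
        return fmix(h1, data.length);
    }
}
```

For inputs whose length is a multiple of 8 the two variants agree; only the byte-tolerant one accepts other lengths.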
This should go into 1.5.2 also. The issue is we were no longer adding the __app__.jar to the system classpath. Author: Thomas Graves <tgraves@staydecay.corp.gq1.yahoo.com> Author: Tom Graves <tgraves@yahoo-inc.com> Closes apache#8959 from tgravescs/SPARK-10901. (cherry picked from commit e978360) Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
This PR implements the following features for both `master` and `branch-1.5`.

1. Display the failed output op count in the batch list
2. Display the failure reason of output op in the batch detail page

Screenshots:
<img width="1356" alt="1" src="https://cloud.githubusercontent.com/assets/1000778/10198387/5b2b97ec-67ce-11e5-81c2-f818b9d2f3ad.png">
<img width="1356" alt="2" src="https://cloud.githubusercontent.com/assets/1000778/10198388/5b76ac14-67ce-11e5-8c8b-de2683c5b485.png">

There are still two remaining problems in the UI.

1. If an output operation doesn't run any Spark job, we cannot get its duration, since the duration is currently the sum of all jobs' durations.
2. If an output operation doesn't run any Spark job, we cannot get the description, since it's the latest job's call site.

We need to add a new `StreamingListenerEvent` about output operations to fix them. So I'd like to fix them only for `master` in another PR.

Author: zsxwing <zsxwing@gmail.com>
Closes apache#8950 from zsxwing/batch-failure.
(cherry picked from commit ffe6831)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
Currently, if it isn't set, it scans `/lib/*` and adds every dir to the classpath, which makes the env too large, and every command called afterwards fails. Author: Kevin Cox <kevincox@kevincox.ca> Closes apache#8994 from kevincox/kevincox-only-add-hive-to-classpath-if-var-is-set.
The created decimal is wrong if using `Decimal(unscaled, precision, scale)` with unscaled > 1e18 and precision > 18 and scale > 0. This bug has existed since the beginning. Author: Davies Liu <davies@databricks.com> Closes apache#9014 from davies/fix_decimal. (cherry picked from commit 37526ac) Signed-off-by: Davies Liu <davies.liu@gmail.com>
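As a reference point for what the correct value looks like under those conditions, the sketch below builds a decimal from an unscaled long and a scale using `java.math.BigDecimal`. It does not use Spark's `Decimal`; the helper name and the sample inputs are chosen purely for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class UnscaledDecimalSketch {
    /** Builds the decimal whose value is unscaled * 10^(-scale). */
    static BigDecimal fromUnscaled(long unscaled, int scale) {
        return new BigDecimal(BigInteger.valueOf(unscaled), scale);
    }

    public static void main(String[] args) {
        // An unscaled value above 1e18 (19 digits) with a positive scale.
        BigDecimal d = fromUnscaled(1234567890123456789L, 2);
        System.out.println(d.toPlainString()); // 12345678901234567.89
        System.out.println(d.precision());     // 19
    }
}
```

The scale must be carried along with the unscaled digits; dropping it would yield a value 10^scale times too large.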
…ifferent Oops size.
UnsafeRow contains 3 pieces of information when pointing to some data in memory (an object, a base offset, and length). When the row is serialized with Java/Kryo serialization, the object layout in memory can change if two machines have different pointer width (Oops in JVM).
To reproduce, launch Spark using
MASTER=local-cluster[2,1,1024] bin/spark-shell --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"
And then run the following
scala> sql("select 1 xx").collect()
Author: Reynold Xin <rxin@databricks.com>
Closes apache#9030 from rxin/SPARK-10914.
(cherry picked from commit 84ea287)
Signed-off-by: Reynold Xin <rxin@databricks.com>
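One way to make such a pointer portable is to serialize only the payload bytes and let the receiver recompute the offset for its own object layout. The sketch below simulates this in plain Java; `RowPointerSketch` is hypothetical and not Spark's `UnsafeRow`, and the header sizes 16 and 24 used in the usage note are illustrative assumptions.

```java
import java.util.Arrays;

/** Hypothetical (base object, absolute offset, length) triple pointing into a byte array. */
final class RowPointerSketch {
    final byte[] base;
    final long offset; // array header size on this JVM + position within the array
    final int length;

    RowPointerSketch(byte[] base, long offset, int length) {
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    /** Serialized form: just the payload bytes, with no layout-dependent offset. */
    byte[] toPortableBytes(int localHeaderSize) {
        int start = (int) (offset - localHeaderSize);
        return Arrays.copyOfRange(base, start, start + length);
    }

    /** Rebuilds the pointer on the receiving side using that side's own header size. */
    static RowPointerSketch fromPortableBytes(byte[] payload, int localHeaderSize) {
        return new RowPointerSketch(payload, localHeaderSize, payload.length);
    }
}
```

For example, a sender with a 16-byte array header and a receiver with a 24-byte header exchange the same payload, and each side derives an offset that is valid locally, instead of the receiver reusing the sender's raw offset.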
…eaming applications

Dynamic allocation can be painful for streaming apps and can lose data. Log a warning for streaming applications if dynamic allocation is enabled.

Author: Hari Shreedharan <hshreedharan@apache.org>
Closes apache#8998 from harishreedharan/ss-log-error and squashes the following commits:

* 462b264 [Hari Shreedharan] Improve log message.
* 2733d94 [Hari Shreedharan] Minor change to warning message.
* eaa48cc [Hari Shreedharan] Log a warning instead of failing the application if dynamic allocation is enabled.
* 725f090 [Hari Shreedharan] Add config parameter to allow dynamic allocation if the user explicitly sets it.
* b3f9a95 [Hari Shreedharan] Disable dynamic allocation and kill app if it is enabled.
* a4a5212 [Hari Shreedharan] [streaming] SPARK-10955. Disable dynamic allocation for Streaming applications.

(cherry picked from commit 0984129)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
…rain with given regParam and convergenceTol parameters These params were being passed into the StreamingLogisticRegressionWithSGD constructor, but not transferred to the call for model training. The same was true of StreamingLinearRegressionWithSGD. I added the params as named arguments to the call and also fixed the intercept parameter, which was being passed as the regularization value. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes apache#9002 from BryanCutler/StreamingSGD-convergenceTol-bug-10959. (cherry picked from commit 5410747) Signed-off-by: Xiangrui Meng <meng@databricks.com>
…n on Aggregate For example, we can write `SELECT MAX(value) FROM src GROUP BY key + 1 ORDER BY key + 1` in PostgreSQL, and we should support this in Spark SQL. Author: Wenchen Fan <cloud0fan@outlook.com> Closes apache#8548 from cloud-fan/support-order-by-non-attribute.
Can we also pull in this fix? https://issues.apache.org/jira/browse/SPARK-10389 It will fix the 100+ failures we ran into when comparing the native and Spark SQL results. Thank you.
Author: Already did.